Text categorization
Similar resources
Text Categorization
Text categorization is the task of assigning predefined categories to natural language text. With the widely used “bag-of-words” representation, previous research usually assigns each word a value that expresses whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they have not fully expressed the abunda...
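As a minimal sketch of the two word values this abstract mentions (presence and frequency), the following assumes whitespace tokenization and a fixed vocabulary. The names `bag_of_words` and `vocabulary` are illustrative and do not come from the cited work.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return (binary, frequency) vectors for `text` over `vocabulary`.

    Assumption: tokens are lowercased, whitespace-separated words; a real
    system would use a proper tokenizer.
    """
    counts = Counter(text.lower().split())
    # Frequency value: how many times each vocabulary word occurs.
    frequency = [counts[word] for word in vocabulary]
    # Binary value: whether each vocabulary word occurs at all.
    binary = [1 if n > 0 else 0 for n in frequency]
    return binary, frequency

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["text", "category", "word"]
binary, frequency = bag_of_words("Text about text and a word", vocabulary)
```

Either vector can then serve as the document's feature representation for a classifier.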
Text Categorization
Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to info...
Evaluating Text Categorization
While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues. In this paper...
Fragments and Text Categorization
We introduce two novel methods of text categorization in which documents are split into fragments. We conducted experiments on English, French and Czech. In all cases, the problems referred to a binary document classification. We find that both methods increase the accuracy of text categorization. For the Naïve Bayes classifier this increase is significant. 1 Motivation In the process of automa...
6 Text Categorization
During the last 15 years, the production of documents in digital form has exploded, due to the increased availability of hardware and software tools for generating digital data (e.g., personal computers, digital cameras, word processors) and for digitizing data that had been originated in nondigital form (e.g., scanners, OCR software). This phenomenon has also strongly affected “novel” digital ...
Journal
Journal title: Scholarpedia
Year: 2008
ISSN: 1941-6016
DOI: 10.4249/scholarpedia.4242